Top-tagging: A Method for Identifying Boosted Hadronic Tops 
A method is introduced for distinguishing top jets (boosted, hadronically decaying top quarks)
from light quark and gluon jets using jet substructure. The procedure involves parsing the jet cluster
to resolve its subjets, and then imposing kinematic constraints. With this method, light quark or
gluon jets with $p_T \simeq 1$ TeV can be rejected with an efficiency of around 99% while retaining up to
40% of top jets. This reduces the dijet background to heavy $t\bar{t}$ resonances by a factor of $\sim 10{,}000$,
thereby allowing resonance searches in $t\bar{t}$ to be extended into the all-hadronic channel. In addition,
top-tagging can be used in $t\bar{t}$ events when one of the tops decays semi-leptonically, in events with
missing energy, and in studies of b-tagging efficiency at high $p_T$.



The Large Hadron Collider (LHC) is a top factory. The 
millions of top quarks it produces will provide profound 
insights into the standard model and its possible exten- 
sions. Most of the tops will be produced near threshold, 
and can be identified using the same kinds of techniques 
applied at the Tevatron - looking for the presence of a 
bottom quark through b-tagging, identifying the W boson,
or finding three jets whose invariant mass is near
$m_t$. However, some of the top quarks produced at the
LHC will be highly boosted. In particular, almost ev- 
ery new physics scenario that addresses the hierarchy 
problem will include new heavy particles which decay 
to tops (such as KK gluons in Randall-Sundrum models,
squarks in supersymmetry, top primes in little Higgs
models, etc.). If their masses are even a factor of a few 
above the top mass, the tops that they produce will de- 
cay to collimated collections of particles that look like 
single jets. In this case, the standard top identification 
techniques may falter: b-tagging is difficult because the
tracks are crowded and unresolvable, the W decay products
are not always isolated from each other or from the
b jet, and the top jet mass may differ from $m_t$ due to an
increased amount of QCD radiation.

In most studies of $t\bar{t}$ resonances, emphasis is placed
on the channel in which one top decays semi-leptonically
(to an electron or muon, a neutrino, and a b jet) and
the other hadronically [1, 2]. This avoids having to confront
the large dijet background to all-hadronic $t\bar{t}$. However,
these studies need to assume that the lepton can be
isolated, which often excludes the electron channel, and
that at least one b jet is tagged, which is difficult at high
$p_T$ [?]. The hard muon tag alone already discards 90%
of the $t\bar{t}$ events. So one would like to be able to use
the all-hadronic channel without b-tags. In this paper,
we introduce a practical and efficient method for tagging
boosted hadronically-decaying tops.

A top quark's dominant decay mode is to a b quark
and a W boson, with the W subsequently decaying to
two light quarks. The three quarks normally appear as
jets in the calorimeter, but for highly boosted tops these
jets may lie close together and may not always be independently
resolved. For example, a zoomed-in lego plot
of a typical top jet is shown in Figure 1. It displays
energy deposited in an ideal calorimeter versus pseudorapidity,
$\eta$, and azimuthal angle, $\phi$. The three quark
jets show up clearly by eye, but it is easy to see how
the number of jets identified by conventional clustering
would be highly variable and strongly dependent on the
jet-resolution parameter. This is the inherent difficulty
with extrapolating the techniques that work for slower
tops, where the decay products are widely separated, to
the boosted case.

FIG. 1: A typical top jet with a $p_T$ of 800 GeV at the LHC.
The three subjets after top-tagging are shaded separately.

The natural direction for finding boosted tops is to
look into subjet analysis and other measures of the energy
distribution in the events. A recent ATLAS note [?]
explored the possibility by cutting on the jet mass and
the $y_{\rm cut}$ variables associated with the $k_T$ algorithm. They
achieved an efficiency of 45% for top-tagging at $p_T = 1$
TeV with 1 in 20 background jets getting through. Such
efficiencies are not strong enough to filter $t\bar{t}$ events from
the enormous dijet background [2].

The key to efficient top-tagging is in isolating features
of QCD which control the background from features particular
to the top quark. As can be seen in Figure 1,
boosted top events look like single jets with three resolvable
subjets in a small region of the calorimeter.
These subjets are separated by angular scales of order
$\sim 2m_t/p_T$, and so remain distinguishable from one another
up to $p_T$'s of roughly 2 TeV for a calorimeter cell
size of 0.1. In QCD, on the other hand, a typical high-$p_T$
jet starts as a single hard parton, which subsequently
cascades into a high multiplicity of soft and collinear par- 
ticles. Most of these particles cannot be resolved by the 
real calorimeter, as they tend to fall into a single cell 
or a set of adjacent cells. In order to look like a de- 
cayed top quark, a hard parton must at least undergo two 
branchings at somewhat large angles and energy sharings, 
which is relatively rare, as we will see. The primary task, 
then, is to isolate events with three hard, nearby subjets. 
Subsequently, we may exploit the full 3-body kinemat- 
ics of top decay to construct additional discriminating 
variables. 
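
As a rough numerical illustration of the angular-scale estimate above, the short Python sketch below evaluates $2m_t/p_T$ for a few jet momenta and compares it with the 0.1 cell size. The value $m_t = 175$ GeV is an assumption made here (it is the centre of the 145-205 GeV mass window used later), and the function name `subjet_separation` is purely illustrative.

```python
M_TOP = 175.0    # GeV; assumed value, centre of the 145-205 GeV window used later
CELL_SIZE = 0.1  # calorimeter cell size quoted in the text

def subjet_separation(pt_gev, m_top=M_TOP):
    """Rough angular scale ~ 2 m_t / p_T between the top decay products."""
    return 2.0 * m_top / pt_gev

for pt in (500.0, 1000.0, 2000.0):
    sep = subjet_separation(pt)
    # Express the separation in units of the calorimeter cell size.
    print(f"pT = {pt:6.0f} GeV: separation ~ {sep:.3f} ({sep / CELL_SIZE:.2f} cells)")
```

At 2 TeV the estimated separation is still larger than one cell but no longer by much, which is consistent with the statement that the subjets stop being resolvable at around that scale.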

In order to avoid the pitfalls mentioned above for fixed- 
size jet clustering, we first cluster an event using a large 
jet radius to capture all of the potential substructure, and 
then iteratively decluster each jet to search for subjets. 
Similar ideas have been employed by Butterworth et
al. to extract substructure in Higgs jets [?] and W jets [?, 8],
and part of our algorithm is an adaptation of their
method.

The top-tagging algorithm is as follows: 

• First, particles are clustered into jets of size R. For
this step, we use the Cambridge-Aachen (CA) algorithm
[9, 10]. This iterative procedure begins
with all four-vectors in an event, as defined by the
energy deposits in the calorimeter. It then finds
the pair which is closest in $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$,
merges it into a single four-vector, and then repeats.
The procedure ends when no two four-vectors
have $\Delta R < R$.

• Next, each jet in the event (for $t\bar{t}$ this would be
one of the hardest two) is declustered, to look for
subjets. This is done by reversing each step in the
CA clustering, iteratively separating each jet into
two objects. The softer of the two objects is thrown
out if its $p_T$ divided by the full jet $p_T$ is less than
a parameter $\delta_p$, and the declustering continues on
the harder object.

• The declustering step is repeated until one of four
things happens: 1) both objects are harder than
$\delta_p$; 2) both objects are softer than $\delta_p$; 3) the two
objects are too close, $|\Delta\eta| + |\Delta\phi| < \delta_r$, where $\delta_r$
is an additional parameter; or 4) there is only one
calorimeter cell left. In case 1), the two hard objects
are considered subjets. In cases 2), 3), and 4),
the original jet is considered irreducible.

• If an original jet declusters into two subjets, the
previous step is repeated on those subjets (with $\delta_p$
still defined with respect to the original jet's $p_T$)
resulting in 2, 3, or 4 subjets of the original jet.
The cases with 3 or 4 subjets are kept, the 4th representing
an additional soft gluon emission, while
the 2 subjet case is rejected.

• With these 3 or 4 subjets in hand, additional kinematic
cuts are imposed: the total invariant mass
should be near $m_t$, two subjets should reconstruct
$m_W$, and the W helicity angle should be consistent
with a top decay, as described below.
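
The clustering and declustering steps listed above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the implementation used in the analysis: the names `Node`, `cluster_ca`, `split` and `top_subjets` are hypothetical, the clustering is a brute-force pair search, pseudorapidity is used for merged objects to follow the notation of the text, and the closeness test (case 3) is assumed to be applied before the hardness tests.

```python
import math
from itertools import combinations

class Node:
    """Four-vector (E, px, py, pz) plus the pair of nodes it was merged from."""
    def __init__(self, p4, parents=None):
        self.p4, self.parents = p4, parents
    @property
    def pt(self):
        return math.hypot(self.p4[1], self.p4[2])
    @property
    def eta(self):
        return math.asinh(self.p4[3] / self.pt)
    @property
    def phi(self):
        return math.atan2(self.p4[2], self.p4[1])

def d_eta_phi(a, b):
    """Return (|Delta eta|, |Delta phi|) with Delta phi wrapped into [0, pi]."""
    dphi = abs(a.phi - b.phi)
    return abs(a.eta - b.eta), min(dphi, 2 * math.pi - dphi)

def delta_r(a, b):
    return math.hypot(*d_eta_phi(a, b))

def cluster_ca(cells, R):
    """Merge the closest pair repeatedly until no pair has Delta R < R."""
    nodes = [Node(p) for p in cells]
    while len(nodes) > 1:
        a, b = min(combinations(nodes, 2), key=lambda ab: delta_r(*ab))
        if delta_r(a, b) >= R:
            break
        nodes.remove(a)
        nodes.remove(b)
        nodes.append(Node(tuple(x + y for x, y in zip(a.p4, b.p4)), (a, b)))
    return nodes

def split(node, jet_pt, delta_p, delta_r_cut):
    """Return two hard subjets of `node`, or None if it is irreducible."""
    while node.parents is not None:              # case 4 ends the loop
        a, b = sorted(node.parents, key=lambda n: n.pt, reverse=True)
        if sum(d_eta_phi(a, b)) < delta_r_cut:   # case 3: objects too close
            return None
        if b.pt / jet_pt > delta_p:              # case 1: both are hard
            return a, b
        if a.pt / jet_pt <= delta_p:             # case 2: both are soft
            return None
        node = a                                 # drop the soft object
    return None

def top_subjets(jet, delta_p, delta_r_cut):
    """Decluster twice; keep the jet only if it gives 3 or 4 subjets."""
    first = split(jet, jet.pt, delta_p, delta_r_cut)
    if first is None:
        return None
    subjets = []
    for sub in first:
        second = split(sub, jet.pt, delta_p, delta_r_cut)
        subjets.extend(second if second else (sub,))
    return subjets if len(subjets) >= 3 else None

def cell(pt, eta, phi):
    """Massless four-vector for a calorimeter cell."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

# Toy jet: three hard, separated cells plus one soft cell.
toy = [cell(400, 0.0, 0.0), cell(300, 0.3, 0.2),
       cell(250, -0.25, 0.3), cell(5, 0.1, -0.3)]
jet = cluster_ca(toy, R=0.8)[0]
subjets = top_subjets(jet, delta_p=0.10, delta_r_cut=0.19)
print(None if subjets is None else len(subjets))
```

Because the harder object is always listed first, testing the softer object against $\delta_p$ is enough to identify case 1, and testing the harder one is enough to identify case 2. A production analysis would use an optimized clustering library rather than the cubic-time pair search shown here.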

For our particular implementation, we simulate dijet
events and $t\bar{t}$ events in the standard model at the LHC
using PYTHIA v.6.415 [11]. In order to simulate the
resolution of the ATLAS or CMS calorimeters, particles
in each event are combined into square bins of size
$\Delta\eta = \Delta\phi = 0.1$, which are interpreted as massless
four-vector "particles" and inputted into the clustering routine.
For jet clustering, we employ the CA algorithm
as implemented in FastJet v.2.3.1 [12]. Because more
highly boosted tops will be more collimated, we correlate
the jet clustering parameter R, the event's scalar $E_T$, and
the two declustering parameters $\delta_p$ and $\delta_r$ as follows: for
$E_T > 1000$, 1600, 2600 GeV, we take $R = 0.8$, 0.6, 0.4,
$\delta_p = 0.10$, 0.05, 0.05 and $\delta_r = 0.19$, 0.19, 0.19 respectively.
Then we demand that the jets be hard by putting
a cut on the jet $p_T$ scaled by the event's scalar $E_T$:
$p_T > 0.7\,E_T/2$. Both jets must also satisfy the absolute
constraints $p_T > 500$ GeV and $|\eta| < 2.5$ to be considered
for analysis.
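
The detector binning and parameter choices described above can be summarized in a minimal Python sketch. Several details are assumptions made here for illustration: the input format `(E, eta, phi)`, summing energies per bin and placing the massless four-vector at the bin centre, and reading the garbled scaled cut as $p_T > 0.7\,E_T/2$. The function names are hypothetical.

```python
import math
from collections import defaultdict

CELL = 0.1  # Delta eta = Delta phi = 0.1, as quoted in the text

def calorimeter_cells(particles, cell=CELL):
    """Sum particle energies per (eta, phi) bin; return massless four-vectors.

    `particles` is an iterable of (E, eta, phi) tuples (assumed input format).
    """
    energy = defaultdict(float)
    for e, eta, phi in particles:
        key = (math.floor(eta / cell), math.floor((phi % (2 * math.pi)) / cell))
        energy[key] += e
    cells = []
    for (i, j), e in energy.items():
        eta, phi = (i + 0.5) * cell, (j + 0.5) * cell  # bin centre (assumption)
        pt = e / math.cosh(eta)                         # massless: E = pT cosh(eta)
        cells.append((e, pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)))
    return cells

# (scalar E_T threshold in GeV, R, delta_p, delta_r), highest threshold first.
PARAMS = [(2600.0, 0.4, 0.05, 0.19),
          (1600.0, 0.6, 0.05, 0.19),
          (1000.0, 0.8, 0.10, 0.19)]

def clustering_params(scalar_et):
    """Return (R, delta_p, delta_r) for an event, or None below 1000 GeV."""
    for threshold, R, delta_p, delta_r in PARAMS:
        if scalar_et > threshold:
            return R, delta_p, delta_r
    return None  # not covered by the quoted settings

def passes_preselection(jet_pt, jet_eta, scalar_et):
    """Scaled cut (assumed to be pT > 0.7 * E_T / 2), pT > 500 GeV, |eta| < 2.5."""
    return jet_pt > 0.7 * scalar_et / 2 and jet_pt > 500.0 and abs(jet_eta) < 2.5

print(clustering_params(1200.0))
print(passes_preselection(600.0, 1.0, 1500.0))
```

Ordering the table from the highest threshold down means the first matching row is always the tightest applicable setting.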

Next, we perform the subjet decomposition, demanding
3 or 4 subjets, as described above. For jets with
$p_T < 1000$ GeV, we then ask that the invariant mass
of the sum of the subjet four-vectors be within 30 GeV
of the top mass (145-205 GeV) and that there exist two
subjets which reconstruct the W mass to within 15 GeV
(65-95 GeV). Harder jets will have broader mass distributions,
due to increased radiation from QCD. Thus, if a
jet has $p_T > 1000$ GeV, we shift the upper ranges of the top
and W mass cuts to $p_T/20 + 155$ GeV and $p_T/40 + 70$
GeV respectively. Finally, we demand that the W helicity
angle satisfy $\cos\theta_h < 0.7$, as we now explain.
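
The mass windows quoted above translate directly into code. The following Python sketch is illustrative only; the function names are hypothetical, and subjets are assumed to be given as `(E, px, py, pz)` tuples in GeV. Note that the shifted upper edges reduce to 205 GeV and 95 GeV at $p_T = 1000$ GeV, so the windows are continuous there.

```python
import math
from itertools import combinations

def mass_windows(jet_pt):
    """Return ((top_lo, top_hi), (w_lo, w_hi)) in GeV for a jet of given pT."""
    top_hi, w_hi = 205.0, 95.0
    if jet_pt > 1000.0:
        # Harder jets: widen only the upper edges, as described in the text.
        top_hi = jet_pt / 20.0 + 155.0
        w_hi = jet_pt / 40.0 + 70.0
    return (145.0, top_hi), (65.0, w_hi)

def inv_mass(*p4s):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = (sum(c) for c in zip(*p4s))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def passes_mass_cuts(subjets, jet_pt):
    """Total mass in the top window and at least one pair in the W window."""
    (t_lo, t_hi), (w_lo, w_hi) = mass_windows(jet_pt)
    if not t_lo <= inv_mass(*subjets) <= t_hi:
        return False
    return any(w_lo <= inv_mass(a, b) <= w_hi
               for a, b in combinations(subjets, 2))

print(mass_windows(800.0))
print(mass_windows(1600.0))
```

For the 4-subjet case this sketch accepts the jet if any pair falls in the W window; how the pair is chosen when several qualify is not specified in the text and is left open here.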

The helicity angle is a standard observable in top decays,
used to determine the Lorentz structure of the top-W
coupling [13]. It is defined as the angle, measured
in the rest frame of the reconstructed W, between the
reconstructed top's flight direction and one of the W decay
products. Normally, it is studied in semi-leptonic top
decays, where the charge of the lepton uniquely identifies
these decay products. In hadronic top decays there
is an ambiguity which we resolve by choosing the lower
$p_T$ subjet, as measured in the lab frame. (Other choices
are possible and make little difference to the final efficiencies.)
For top jets, the distribution is basically flat:
since the W decays on-shell, its decay products are almost
isotropically distributed in the W rest frame. In
contrast, for light quark or gluon jets, the distribution
diverges (at the parton level) as $1/(1 - \cos\theta_h)$. This
corresponds to a soft singularity in the QCD matrix elements
for emitting an additional parton. Example distributions
are shown in Figure 2. The qualitative features
we understand analytically at the parton level are clearly
visible after showering and hadronization. Other observables
sensitive to the soft singularity are possible [?], and
will give similar signal/background enhancements.
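
A minimal Python sketch of the helicity-angle definition above is given below. It boosts both the reconstructed top and the lower-$p_T$ W subjet into the rest frame of the reconstructed W and takes the cosine of the angle between them. The function names and the `(E, px, py, pz)` tuple format are assumptions made for illustration, and the example four-vectors are toy numbers rather than simulated events.

```python
import math

def boost_to_rest_frame(p4, frame):
    """Boost p4 = (E, px, py, pz) into the rest frame of four-vector `frame`."""
    e, p = p4[0], p4[1:]
    beta = [c / frame[0] for c in frame[1:]]
    b2 = sum(b * b for b in beta)
    if b2 == 0.0:
        return tuple(p4)  # frame is already at rest
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = sum(b * c for b, c in zip(beta, p))
    coeff = (gamma - 1.0) * bp / b2 - gamma * e
    return (gamma * (e - bp),) + tuple(c + coeff * b for c, b in zip(p, beta))

def cos_helicity(w_subjets, top_p4):
    """cos(theta_h) using the lower-pT W subjet (lab frame) as reference."""
    a, b = w_subjets
    w_p4 = tuple(x + y for x, y in zip(a, b))
    softer = min(a, b, key=lambda q: math.hypot(q[1], q[2]))
    q = boost_to_rest_frame(softer, w_p4)[1:]
    t = boost_to_rest_frame(top_p4, w_p4)[1:]
    dot = sum(x * y for x, y in zip(q, t))
    return dot / math.sqrt(sum(x * x for x in q) * sum(x * x for x in t))

# Toy example: two massless W subjets along x and a third subjet along y.
q1, q2, b_jet = (50.0, 50.0, 0.0, 0.0), (10.0, -10.0, 0.0, 0.0), (30.0, 0.0, 30.0, 0.0)
top = tuple(sum(c) for c in zip(q1, q2, b_jet))
cos_h = cos_helicity((q1, q2), top)
print(cos_h, cos_h < 0.7)
```

In the W rest frame the top and the remaining subjet move in the same direction, so boosting the summed top four-vector is equivalent to using the third subjet as the reference axis.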

FIG. 2: Distribution of helicity angle for top jets, gluon jets,
and light quark jets for $p_T > 700$ GeV. These distributions
are after the subjet requirement, top mass cut, and W mass
cut have been imposed.



FIG. 3: The efficiencies for correctly tagging a top jet ($\epsilon_t$),
and mistagging a gluon jet ($\epsilon_g$) or light quark jet ($\epsilon_q$). The
quark and gluon efficiencies are of order 1% and have been
scaled in the plot by a factor of 10 for clarity.



To check the efficacy of this method, we calculate the
efficiency for correctly tagging a top jet, $\epsilon_t$, and the efficiencies
for mistagging light-quark or gluon jets as tops,
$\epsilon_q$ and $\epsilon_g$ respectively. These are shown in Figure 3.
There are a few important qualitative observations one
can make about this plot. For very large $p_T$ the top-tagging
efficiency goes down. This is because these jets
are so highly boosted that the calorimeter can no longer
distinguish the subjets. As $p_T$ goes below 900 GeV, the
top-tagging efficiency also decreases. This is due to some
of the top jets becoming too fat for the initial $R = 0.8$
clustering. (This somewhat tight choice was made to
suppress the mistag efficiency, which grows faster than
the top-tag efficiency with increasing R.) Examples of
the sequential effects of the individual cuts are shown in
Table I. The clustering R's and kinematic cuts can be
varied to increase the tagging and mistagging efficiencies,
as desired for a particular $S/\sqrt{B}$ goal.





                $p_T$ (GeV)    subjets    $m_t$     $m_W$     $\theta_h$

$\epsilon_t$    500-600        0.56       0.43      0.38      0.32
                1000-1100      0.66       0.52      0.44      0.39
                1500-1600      0.40       0.33      0.28      0.25

$\epsilon_g$    500-600        0.135      0.045     0.027     0.015
                1000-1100      0.146      0.054     0.032     0.018
                1500-1600      0.083      0.038     0.025     0.015

$\epsilon_q$    500-600        0.053      0.018     0.011     0.005
                1000-1100      0.063      0.023     0.013     0.006
                1500-1600      0.032      0.015     0.010     0.006

TABLE I: Incremental efficiencies for top, gluon, and light
quark jets passing the subjets, invariant mass, and helicity
angle cuts for jets in three different $p_T$ windows.
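
To see how much each individual cut removes, the top-jet rows of Table I can be converted into per-cut survival fractions. The Python sketch below assumes that the table entries are cumulative (each column gives the fraction of jets surviving all cuts up to and including that one), which is how the decreasing values read; the helper name `per_cut_survival` is illustrative.

```python
# Top-jet efficiencies from Table I after each successive cut.
CUTS = ("subjets", "m_t", "m_W", "theta_h")
TOP_EFF = {
    "500-600": (0.56, 0.43, 0.38, 0.32),
    "1000-1100": (0.66, 0.52, 0.44, 0.39),
    "1500-1600": (0.40, 0.33, 0.28, 0.25),
}

def per_cut_survival(cumulative):
    """Fraction of jets passing each cut, given that they passed the previous ones."""
    previous = 1.0
    out = []
    for eff in cumulative:
        out.append(eff / previous)
        previous = eff
    return out

for window, row in TOP_EFF.items():
    fractions = ", ".join(f"{c}: {f:.2f}" for c, f in zip(CUTS, per_cut_survival(row)))
    print(f"pT {window} GeV -> {fractions}")
```

Under this reading, the subjet requirement is the most costly single step for top jets in every window, while the later kinematic cuts each retain most of the jets that reach them.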

One important concern is whether the Monte Carlo
generates the $t\bar{t}$ and dijet distributions correctly. Jet
substructure in particular is strongly dependent on aspects
of the parton shower (both initial state and final
state radiation), the underlying event, and the model of
hadronization. To approach these issues, we redid our
analysis using samples generated with various shower parameters,
with the "new" $p_T$-ordered dipole shower in
PYTHIA, and with HERWIG v.6.510 [14]. We find a 50%
variation in $\epsilon_q$ and $\epsilon_g$ and a negligible change in $\epsilon_t$. We
also ran PYTHIA with multiple interactions and initial
state radiation turned off, individually and together. Effects
on $\epsilon_q$ and $\epsilon_g$ are at the 10% level or less, indicating
that the QCD jet substructure relevant for top-tagging
is mostly controlled by final state parton branchings.

One might also be worried about whether, since we
are looking at multi-(sub)jet backgrounds, it would be
important to include full matrix element calculations.
However, since the events are essentially two jet events,
the substructure is due almost entirely to collinear radiation,
which the parton shower should correctly reproduce
[15]. To confirm this, we have also simulated
background events using MADGRAPH v.4.2.4 [16]. Using
events with $2 \to 4$ matrix elements in a region of phase
space where 1 parton recoils against 3 relatively collinear
partons, we repeated our analysis without showering or
hadronization. The resulting mistag efficiencies were consistent
with those from the PYTHIA study to within 10%,
which provides justification for both the parton shower
approximation and the robustness of our algorithm.

One possible way to verify the Monte Carlo predictions
for jet substructure would be to use data directly.
Although boosted tops are not produced at the Tevatron,
there are plenty of hard dijet events. These could be used
to test the mistag efficiency, tune the Monte Carlo, and
optimize jet-tagging parameters for the LHC. In addition,
at the LHC, the efficiency of the top-tagging algorithm
can be calibrated by comparing the rate for $t\bar{t}$ events
where one top decays semi-leptonically with the rate in
the all-hadronic channel. The background rejection efficiency
can also be studied by looking in side-bands where
the jet invariant mass is not close to $m_t$.

[Figure 4; horizontal axis: dijet/$t\bar{t}$ invariant mass M (GeV)]

FIG. 4: Effect of top jet tag on standard-model $t\bar{t}$ and dijet
distributions at the LHC. Both the $t$ and $\bar{t}$ decay hadronically,
and no b-tagging is used. With top-tagging, a strongly-produced
$t\bar{t}$ resonance (not shown) would stand out clearly
over background in this channel.

Top-tagging may be particularly useful in the search
for new physics in resonances. In the all-hadronic channel,
the biggest background for $t\bar{t}$ is dijets, so in Figure 4
we show the dijet and $t\bar{t}$ invariant mass distributions before
and after top-tagging both jets. It is evident that
after top-tagging, the dijet sample is reduced to the level
of the $t\bar{t}$ sample. As an example application, in certain
Randall-Sundrum models [?], KK gluons decay dominantly
to $t\bar{t}$. It has been shown that if one can isolate
the $t\bar{t}$ events, the resonance will stand out as a clean peak
over the standard model $t\bar{t}$ background [?]. Since
top-tagging can reduce the dijet background to the size
of the $t\bar{t}$ background, $t\bar{t}$ resonance searches can be done
in the all-hadronic channel for resonances up to a few
TeV.
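
The size of the dijet suppression follows from simple arithmetic when both jets are tagged. The Python sketch below uses the approximate single-jet efficiencies quoted in this paper (about 40% for tops and about 1% for light quark or gluon jets) and assumes, for illustration only, that the two tags in an event are independent and that both background jets have the same mistag rate. The function name is hypothetical.

```python
import math

EPS_TOP, EPS_MISTAG = 0.40, 0.01  # approximate single-jet values quoted in the text

def double_tag_factors(eps_sig=EPS_TOP, eps_bkg=EPS_MISTAG):
    """Effect of requiring a top tag on both jets, assuming independent tags."""
    signal_kept = eps_sig ** 2
    background_kept = eps_bkg ** 2
    return {
        "signal_kept": signal_kept,
        "background_rejection": 1.0 / background_kept,
        "s_over_b_gain": signal_kept / background_kept,
        "s_over_sqrt_b_gain": signal_kept / math.sqrt(background_kept),
    }

for name, value in double_tag_factors().items():
    print(f"{name}: {value:.4g}")
```

With these inputs the dijet background is reduced by a factor of order 10,000 while roughly 16% of all-hadronic signal events are retained, which matches the suppression quoted in the abstract.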

There are many applications for top-tagging besides $t\bar{t}$
resonance searches. For example, a common new physics
signal is $t\bar{t}$ pairs in association with missing energy [20].
This may happen, for instance, in supersymmetry when
heavy top squark pairs decay to highly boosted tops and
neutralinos. Top-tagging can not only reduce the standard
model backgrounds in this context, but it can also
help distinguish top jets from light quark jets in any signal
event, which may be helpful in studying the flavor
structure of the new physics. In addition, top-tagging
could potentially be applied in searches for single top
events where exactly one top jet is required. Finally,
our technique could be used as a handle for measuring
b-tagging efficiency at high $p_T$.

In conclusion, we have demonstrated that it is possible
to distinguish highly energetic top quarks from standard
model backgrounds at the LHC. With efficiencies
$\epsilon_t \sim 40\%$ and $\epsilon_q \sim \epsilon_g \sim 1\%$, top-tagging is better than
b-tagging at high $p_T$. Top jets can now be considered
standard objects for event analysis at the LHC, as b jets
are at the Tevatron.